Drawbacks and Proposed Solutions for Real-time Processing on Existing State-of-the-art Locality Sensitive Hashing Techniques
Nearest-neighbor query processing is a fundamental operation for many image
retrieval applications. Often, images are stored and represented by
high-dimensional vectors that are generated by feature-extraction algorithms.
Since tree-based index structures are shown to be ineffective for high
dimensional processing due to the well-known "Curse of Dimensionality",
approximate nearest neighbor techniques are used for faster query processing.
Locality Sensitive Hashing (LSH) is a very popular and efficient approximate
nearest neighbor technique that is known for its sublinear query processing
complexity and theoretical guarantees. Several diverse application domains now
require real-time storage and processing of high-dimensional data, but existing
LSH techniques are not suited to handling real-time data and queries. In this
paper, we
discuss the challenges and drawbacks of existing LSH techniques for processing
real-time high-dimensional image data. Additionally, through experimental
analysis, we propose improvements for existing state-of-the-art LSH techniques
for efficient processing of high-dimensional image data.
Comment: Accepted and Presented at the 5th International Conference on Signal
and Image Processing (SIGI-2019), Dubai, UA
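
To make the idea of LSH-based approximate nearest-neighbor search concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes the random-hyperplane (sign-bit) hash family, chosen here only because it is short to write down; the class name `HyperplaneLSH` and its parameters (`num_tables`, `num_bits`, `seed`) are hypothetical. Vectors that collide with the query in at least one hash table become candidates, and only those candidates are ranked by exact distance.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """Hash a vector to a bit tuple: one sign bit per hyperplane."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


class HyperplaneLSH:
    """Toy multi-table LSH index (illustrative assumption, not the paper's)."""

    def __init__(self, dim, num_tables=8, num_bits=6, seed=0):
        # Each table has its own hyperplanes and its own bucket map.
        self.tables = [
            (make_hyperplanes(num_bits, dim, seed=seed + t), defaultdict(list))
            for t in range(num_tables)
        ]
        self.vectors = []

    def add(self, vec):
        """Store a vector and register its index in one bucket per table."""
        idx = len(self.vectors)
        self.vectors.append(list(vec))
        for planes, buckets in self.tables:
            buckets[signature(vec, planes)].append(idx)
        return idx

    def query(self, vec, k=1):
        """Return up to k candidate indices, nearest first."""
        # Candidates are vectors sharing a bucket with the query in any table.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(signature(vec, planes), []))

        # Rank only the candidates by exact squared Euclidean distance.
        def sq_dist(i):
            return sum((a - b) ** 2 for a, b in zip(self.vectors[i], vec))

        return sorted(candidates, key=lambda i: (sq_dist(i), i))[:k]
```

A query only touches the buckets it hashes to, which is where the speedup over a full scan comes from. The result is approximate: a true neighbor that collides with the query in no table is missed, and adding tables trades memory for recall.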
qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces
Similarity search queries in high-dimensional spaces are an important type of
query in many domains, such as image processing and machine learning. Since
exact similarity search indexing techniques suffer from the well-known curse of
dimensionality in high-dimensional spaces, approximate search techniques are
often utilized instead. Locality Sensitive Hashing (LSH) has been shown to be
an effective approximate search method for solving similarity search queries in
high-dimensional spaces. In real-world settings, queries often arrive as
part of a query workload. LSH and its variants are particularly designed to
solve single queries effectively. They suffer from one major drawback while
executing query workloads: they do not take into consideration important data
characteristics for effective cache utilization while designing the index
structures. In this paper, we present qwLSH, an index structure for efficiently
processing similarity search query workloads in high-dimensional spaces. We
intelligently divide a given cache during processing of a query workload by
using novel cost models. Experimental results show that, given a query
workload, qwLSH is able to perform faster than existing techniques due to its
unique cost models and strategies.
Comment: Extended version of the published wor